Distortive Effects of Initial-Based Name Disambiguation on Measurements of Large-Scale Coauthorship Networks
Abstract
Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach risks incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors do not substantially affect research findings. This paper tests that assumption by analyzing coauthorship networks from five academic fields (biology, computer science, nanoscience, neuroscience, and physics) and an interdisciplinary journal, PNAS. Name instances in the datasets of this study were disambiguated using heuristics drawn from previous algorithmic disambiguation solutions. We use the disambiguated data as a proxy for ground truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties of coauthorship networks: it deflates the number of unique authors, the number of components, average shortest path length, clustering coefficient, and assortativity, while it inflates average productivity, density, average number of coauthors per author, and largest component size. Also, on average, more than half of the top 10 most productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentifications by initial-based disambiguation because of their common surnames and given-name initials.
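The merging effect described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the three initial-based variants that were tested, so the two keying schemes below (surname plus first initial, and surname plus all initials) are assumptions chosen for illustration, and the records and function names are hypothetical rather than taken from the paper's data or code.

```python
from collections import Counter

# Hypothetical records: (paper_id, author names in "Surname, Given Middle" form).
papers = [
    ("p1", ["Wang, Jian", "Smith, John A."]),
    ("p2", ["Wang, Jing", "Smith, John B."]),
    ("p3", ["Wang, Jun", "Lee, Sang Hoon"]),
]

def full_name_key(name):
    # Stand-in for disambiguated identities: every distinct full name is one author.
    return name.lower()

def first_initial_key(name):
    # Assumed variant: surname plus the first initial of the given name.
    surname, given = [part.strip() for part in name.split(",", 1)]
    return f"{surname}, {given[0]}".lower()

def all_initials_key(name):
    # Assumed variant: surname plus the initials of all given-name tokens.
    surname, given = [part.strip() for part in name.split(",", 1)]
    initials = "".join(token[0] for token in given.split())
    return f"{surname}, {initials}".lower()

def summarize(papers, key_fn):
    # Count papers per author identity under the chosen keying scheme.
    counts = Counter(key_fn(name) for _, authors in papers for name in authors)
    n_authors = len(counts)
    avg_productivity = sum(counts.values()) / n_authors
    return n_authors, avg_productivity

for label, fn in [("full name", full_name_key),
                  ("first initial", first_initial_key),
                  ("all initials", all_initials_key)]:
    print(label, summarize(papers, fn))
```

In this toy data, three distinct authors named Wang collapse into a single identity under both initial-based keys, which lowers the count of unique authors and raises average productivity, the same direction of distortion the abstract reports.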
Similar resources
Merging error analysis of name disambiguation based on author similarity
Falsely identifying different authors as one is called merging error in the name disambiguation of coauthorship networks. Research on the measurement and distribution of merging errors helps to collect high quality coauthorship networks. In the aspect of measurement, we provide a Bayesian model to measure the errors through author similarity. We illustratively use the model and coauthor similar...
Motif-based success scores in coauthorship networks are highly sensitive to author name disambiguation.
Following the work of Krumov et al. [Eur. Phys. J. B 84, 535 (2011)] we revisit the question whether the usage of large citation datasets allows for the quantitative assessment of social (by means of coauthorship of publications) influence on the progression of science. Applying a more comprehensive and well-curated dataset containing the publications in the journals of the American Physical So...
Community Detection using a New Node Scoring and Synchronous Label Updating of Boundary Nodes in Social Networks
Community structure is vital to discover the important structures and potential property of complex networks. In recent years, the increasing quality of local community detection approaches has become a hot spot in the study of complex network due to the advantages of linear time complexity and applicable for large-scale networks. However, there are many shortcomings in these methods such as in...
LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring
Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although in most of these schemes no location information is considered, there are scenarios that location information can be obtained by nodes after their deployment. In this paper,...
Journal: JASIST
Volume: 67
Publication date: 2016